Abstract — Generating random bits from a source of biased coins (the bias is unknown) is a classical question that was originally studied by von Neumann. There are a number of known algorithms that have asymptotically optimal information efficiency, namely, the expected number of generated random bits per input bit is asymptotically close to the entropy of the source. However, only the original von Neumann algorithm has a 'streaming property': it operates on a single input bit at a time and generates random bits whenever possible. Alas, it does not have optimal information efficiency.

The main contribution of this paper is an algorithm that generates random bit streams from biased coins, uses bounded space and runs in expected linear time. As the size of the allotted space increases, the algorithm approaches the information-theoretic upper bound on efficiency. In addition, we discuss how to extend this algorithm to generate random bit streams from m-sided dice or correlated sources such as Markov chains.

Index Terms — Random Number Generation, Biased Coins, 
Markov Chains, Streams. 



I. Introduction 

THE question of generating random bits from a source of biased coins dates back to von Neumann [8], who observed that when one focuses on a pair of coin tosses, the events HT and TH have the same probability (H is for 'head' and T is for 'tail') of being generated; hence, HT produces the output symbol 1 and TH produces the output symbol 0. The other two possible events, namely, HH and TT, are ignored: they do not produce any output symbols. However, von Neumann's algorithm is not optimal in terms of the number of random bits that are generated. This problem has been solved: given a fixed number of biased coin tosses with unknown probability, it is well known how to generate random bits with asymptotically optimal efficiency, namely, the expected number of unbiased random bits generated per coin toss is asymptotically equal to the
entropy of the biased coin Ill-llH. However, these solutions, 
including Elias's algorithm and Peres's algorithm, can generate 
random bits only after receiving the complete input sequence 
(or a fixed number of input bits), and the number of random 
bits generated is a random variable. 

We consider the setup of generating a "stream" of random bits; that is, whenever random bits are required, the algorithm reads new coin tosses and generates random bits dynamically. Our new streaming algorithm is more efficient (in the number of input bits, memory and time) for producing the required number of random bits and is a better choice for implementation in practical systems. We note that von Neumann's scheme is the only one among these schemes that can generate a stream of random bits, but its efficiency is far from optimal. Our goal is to modify this scheme so that it achieves the information-theoretic upper bound on efficiency. Specifically, we would like to construct a function $f: \{H,T\}^* \to \{0,1\}^*$ that satisfies the following conditions:

• $f$ generates a stream. For any two sequences of coin tosses $x, y \in \{H,T\}^*$, $f(x)$ is a prefix of $f(xy)$.

• $f$ generates random bits. Let $X_k \in \{H,T\}^*$ be the sequence of coin tosses inducing $k$ bits; that is, $|f(X_k)| \ge k$ but for any strict prefix $X$ of $X_k$, $|f(X)| < k$. Then the first $k$ bits of $f(X_k)$ are independent and unbiased.

• $f$ has asymptotically optimal efficiency. That is, $E[|X_k|]/k \to 1/H(p)$ as $k \to \infty$, where $p$ is the probability of H and $H(p)$ is the entropy of the biased coin.
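The efficiency target in the third condition can be evaluated directly. The short Python sketch below computes the binary entropy $H(p)$ and the resulting asymptotic lower bound $1/H(p)$ on the expected number of coin tosses per output bit. The function names are our own and do not come from the paper. The values it produces agree with the last row of Table I.

```python
import math


def entropy(p: float) -> float:
    """Binary entropy H(p), in bits, of a coin that shows H with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def min_tosses_per_bit(p: float) -> float:
    """Asymptotic lower bound 1/H(p) on the expected coin tosses per random bit."""
    return 1.0 / entropy(p)


if __name__ == "__main__":
    for p in (0.1, 0.2, 0.3, 0.4, 0.5):
        print(f"p={p}: H(p)={entropy(p):.4f}, 1/H(p)={min_tosses_per_bit(p):.4f}")
```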



We note that the von Neumann scheme uses only 3 states, i.e., a symbol in $\{\phi, H, T\}$, for storing state information. For example, the output bit is 1 if and only if the current state is H and the input symbol is T. In this case, the new state is $\phi$. Similarly, the output bit is 0 if and only if the current state is T and the input symbol is H. In this case, the new state is $\phi$. Our approach for generalizing von Neumann's scheme is to increase the memory (or state) of our algorithm so that we do not lose information that might be useful for generating future random bits. We represent the state information as a binary tree, called a status tree, in which each node is labeled by a symbol in $\{\phi, H, T, 0, 1\}$. When a source symbol (a coin toss) is received, we modify the status tree based on certain simple rules and generate random bits dynamically. This is the key idea in our algorithm; we call this approach the random-stream algorithm. In some sense, the random-stream algorithm is the streaming version of Peres's algorithm. We show that this algorithm satisfies all three conditions above, namely, it can generate a stream of random bits with asymptotically optimal efficiency. In practice, we can reduce the space size by limiting the depth of the status tree. We will demonstrate that as the depth of the status tree increases, the efficiency of the algorithm quickly converges to the information-theoretic upper bound.
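For comparison with the status tree, the three-state von Neumann scheme described above fits in a few lines. This Python sketch is our own illustration, and the generator name is hypothetical. It uses `None` for the state $\phi$ and emits a bit as soon as an unequal pair is completed.

```python
from typing import Iterable, Iterator, Optional


def von_neumann_stream(tosses: Iterable[str]) -> Iterator[int]:
    """Three-state von Neumann extractor; the state is None (phi), 'H' or 'T'."""
    state: Optional[str] = None
    for y in tosses:
        if state is None:
            state = y              # first toss of a pair
            continue
        if state == "H" and y == "T":
            yield 1                # HT -> 1
        elif state == "T" and y == "H":
            yield 0                # TH -> 0
        state = None               # HH and TT are discarded; back to phi
```

For example, `list(von_neumann_stream("HTTTHT"))` gives `[1, 1]`. Here the pair TT is simply thrown away, and that is exactly the information the status tree is designed to keep.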

An extension of the question is to generate random bits or random-bit streams from an arbitrary Markov chain with unknown transition probabilities. This problem was first studied by Samuelson [7], and his algorithm was later improved by Blum [1]. Recently, we proposed the first known algorithm that runs in expected linear time and achieves the information-theoretic upper bound on efficiency [9]. In this paper, we briefly introduce the techniques for generating random-bit streams from Markov chains.

The rest of the paper is organized as follows. Section II presents our key result, the random-stream algorithm that generates random bit streams from arbitrary biased coins and achieves the information-theoretic upper bound on efficiency. In Section III we generalize the random-stream algorithm to generate random bit streams from a source with a larger alphabet. An extension for Markov chains is provided in Section IV, followed by the concluding remarks.

II. The Random-Stream Algorithm 

A. Description 

Many algorithms have been proposed for efficiently generating random bits from a fixed number of coin tosses, including Elias's algorithm and Peres's algorithm. However, in these algorithms, the input bits can be processed only after all of them have been received, and the number of random bits generated cannot be controlled. In this section, we derive a new algorithm, the random-stream algorithm, that generates a stream of random bits from an arbitrary biased-coin source and achieves the information-theoretic upper bound on efficiency. Given an application that requires random bits, the random-stream algorithm can generate random bits dynamically based on requests from the application.

While von Neumann's scheme can generate a stream of random bits from an arbitrary biased coin, its efficiency is far from optimal. The main reason is that it uses minimal state information, recorded by a symbol from the three-letter alphabet $\{\phi, H, T\}$. The key idea in our algorithm is to create a binary tree for storing the state information, called a status tree. A node in the status tree stores a symbol in $\{\phi, H, T, 0, 1\}$. The following procedure shows how the status tree is created and dynamically updated in response to arriving input bits. At the beginning, the tree has only a single root node labeled $\phi$. When reading a coin toss from the source, we modify the status tree based on certain rules. Each node that receives a message (H or T) updates its label, and it may pass new messages to its children. We keep processing the status tree in this way until no more messages are generated. Specifically, let $u$ be a node in the tree. Assume the label of $u$ is $x \in \{\phi, H, T, 1, 0\}$ and it receives a symbol $y \in \{H, T\}$ from its parent node (or from the source if $u$ is the root node). Depending on the values of $x$ and $y$, we perform the following operations on node $u$.

1) When $x = \phi$, set $x \leftarrow y$.

2) When $x = 1$ or $0$, output $x$ and set $x \leftarrow y$.

3) When $x = H$ or $T$, we first check whether $u$ has children. If it does not, we create two children labeled $\phi$. Let $u_l$ and $u_r$ denote the two children of $u$.

• If $xy = HH$, we set $x \leftarrow \phi$, then pass a symbol T to $u_l$ and a symbol H to $u_r$.

• If $xy = TT$, we set $x \leftarrow \phi$, then pass a symbol T to $u_l$ and a symbol T to $u_r$.

• If $xy = HT$, we set $x \leftarrow 1$, then pass a symbol H to $u_l$.

• If $xy = TH$, we set $x \leftarrow 0$, then pass a symbol H to $u_l$.

In other words, identifying H with 1 and T with 0, node $u$ passes the symbol $x + y \bmod 2$ to its left child, and if $x = y$ it also passes the symbol $x$ to its right child.
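The three rules translate almost line by line into code. The following Python sketch is our own illustration of the unbounded random-stream algorithm, not the authors' implementation. A label is stored as `None` for $\phi$, as the strings `'H'`/`'T'`, or as the integers 0/1. We assume the messages of rule 3 are processed depth-first, with the left child before the right child.

```python
from typing import List, Optional


class Node:
    """A status-tree node; label is None (phi), 'H', 'T', 0 or 1."""

    def __init__(self) -> None:
        self.label = None
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def receive(u: Node, y: str, out: List[int]) -> None:
    """Deliver symbol y ('H' or 'T') to node u; output bits are appended to out."""
    x = u.label
    if x is None:                          # rule 1: x = phi
        u.label = y
    elif x in (0, 1):                      # rule 2: release the stored bit
        out.append(x)
        u.label = y
    else:                                  # rule 3: x is 'H' or 'T'
        if u.left is None:
            u.left, u.right = Node(), Node()
        if x == y:                         # HH or TT
            u.label = None
            receive(u.left, "T", out)      # x + y mod 2 = T
            receive(u.right, x, out)       # the repeated symbol goes right
        else:                              # HT -> 1, TH -> 0
            u.label = 1 if x == "H" else 0
            receive(u.left, "H", out)      # x + y mod 2 = H


class RandomStream:
    """Feeds coin tosses into a status tree and returns the released bits."""

    def __init__(self) -> None:
        self.root = Node()

    def feed(self, toss: str) -> List[int]:
        out: List[int] = []
        receive(self.root, toss, out)
        return out


def random_stream(tosses: str) -> List[int]:
    """Run the algorithm on a whole string of tosses."""
    stream = RandomStream()
    bits: List[int] = []
    for t in tosses:
        bits.extend(stream.feed(t))
    return bits
```

For instance, `random_stream("HTTTHT")` returns `[1, 1]`, as in Example 1. The root then still holds the label 1, which is released only when the next toss arrives.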

Note that the timing is crucial: we output a node's label (when it is 1 or 0) only after the node receives the next symbol from its parent or from the source. This differs from von Neumann's scheme, where a 1 or a 0 is generated immediately without waiting for the next symbol. If we consider only the output of the root node in the status tree, the algorithm is similar to von Neumann's scheme, and the other nodes correspond to the information that von Neumann's scheme discards. In some sense, the random-stream algorithm can be treated as a "stream" version of Peres's algorithm. The following example is constructed for the purpose of demonstration.

Example 1. Assume we have a biased coin and our randomized application requires 2 random bits. Fig. 1 illustrates how the random-stream algorithm works when the incoming stream is HTTTHT... In this figure, we can see how the status tree changes and which messages (symbols) are passed through the tree at each step. We see that the output stream is 11...

Lemma 1. Let X be the current input sequence and let T be the current status tree. Given T and the bits generated by each node in T, we can reconstruct X uniquely.

Proof: We prove this lemma by induction on the maximum depth of the status tree. If the maximum depth of the status tree is 0, it has only a single node. In this case, X is exactly the label on the single node (or the empty sequence if the label is $\phi$). Hence the conclusion is trivial. Now we show that if the conclusion holds for all status trees with maximum depth at most k, then it also holds for all status trees with maximum depth k + 1.

Given a status tree T with maximum depth k + 1, we let $Y \in \{0,1\}^*$ denote the binary sequence generated by the root node, and let $L, R \in \{H,T\}^*$ be the sequences of symbols received by its left child and right child. If the label of the root node is in $\{0, 1\}$, we append it to Y. According to the random-stream algorithm, it is easy to see that

$$|L| = |Y| + |R|.$$

Indeed, every pair of symbols read by the root sends exactly one symbol to the left child, and it contributes either one bit to Y (an unequal pair) or one symbol to R (an equal pair).

By the induction hypothesis, L and R can be reconstructed from the left and right subtrees and the bits generated by each node in these subtrees, since their depths are at most k. We show that once L, R, Y satisfy the equality above, the input sequence X can be uniquely constructed from L, R, Y and a, where a is the label of the root node. The procedure is as follows. Start from an empty string for X and read symbols from L sequentially. If a symbol read from L is H, we read a bit from Y: if this bit is 1 we append HT to X, otherwise we append TH to X. If a symbol read from L is T, we read a symbol (H or T) from R: if this symbol is H we append HH to X, otherwise we append TT to X.

After reading all the elements of L, R and Y, the length of the resulting input sequence is 2|L|. Finally, we append a to the resulting sequence if $a \in \{H, T\}$. This yields the final sequence X, which is unique. ■

Fig. 1. An instance for generating 2 random bits using the random-stream algorithm.

Example 2. Let us consider the final status tree in Fig. 1, and suppose we know that the root node has generated 1 and the first (left) node in the second level has generated 1. We can draw the following conclusions iteratively, from the bottom level up.

• In the third level, the node labeled H has received the symbol H, and the node labeled $\phi$ has not received any symbols.

• In the second level, the left node (the one that generated 1) has received HTH, and the right node, labeled T, has received T.

• The root node has received HTTTHT, which agrees with Example 1.
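The reconstruction step in the proof of Lemma 1 can be checked on the numbers of Example 2. The hypothetical Python helper below (the name `merge_node_input` is ours) rebuilds what one node received. `Y` must already include the node's label when that label is 0 or 1, and `a` is passed only when the label is a letter.

```python
from typing import List, Optional


def merge_node_input(L: str, R: str, Y: List[int], a: Optional[str]) -> str:
    """Rebuild the symbols a node received (proof of Lemma 1).

    L, R: symbols received by the left and right child.
    Y:    bits generated by the node, plus its label if the label is 0 or 1.
    a:    the node's label if it is 'H' or 'T', otherwise None.
    """
    if len(L) != len(Y) + len(R):
        raise ValueError("inputs must satisfy |L| = |Y| + |R|")
    bits, right = iter(Y), iter(R)
    pairs = []
    for s in L:
        if s == "H":                       # an unequal pair: 1 -> HT, 0 -> TH
            pairs.append("HT" if next(bits) == 1 else "TH")
        else:                              # an equal pair, its symbol stored in R
            r = next(right)
            pairs.append(r + r)
    return "".join(pairs) + (a if a in ("H", "T") else "")
```

Applied bottom-up, `merge_node_input("H", "", [1], "H")` gives `"HTH"` for the left node in the second level. `merge_node_input("HTH", "T", [1, 1], None)` then gives `"HTTTHT"` for the root, where `Y = [1, 1]` is the emitted bit followed by the root's label 1.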

Let $f: \{H,T\}^* \to \{0,1\}^*$ be the function computed by the random-stream algorithm. We show that this function satisfies all three conditions described in the introduction. It is easy to see that the first condition holds, i.e., for any two sequences $x, y \in \{H,T\}^*$, $f(x)$ is a prefix of $f(xy)$; hence it generates streams. The following two theorems show that $f$ also satisfies the other two conditions.

Theorem 2. Given a source of biased coins with unknown probability, the random-stream algorithm generates a stream of random bits, i.e., for any k > 0, if we stop running the algorithm after generating k bits then these k bits are independent and unbiased.

Let $S_Y$, with $Y \in \{0,1\}^k$, denote the set of all sequences of coin tosses that yield Y. Here, we say that a sequence X yields Y if and only if $X[1 : |X| - 1]$ (the prefix of X with length $|X| - 1$) generates a sequence shorter than Y, and X generates a sequence with Y as a prefix (including Y itself). To prove that the algorithm can generate random-bit streams, we show that for any distinct binary sequences $Y_1, Y_2 \in \{0,1\}^k$, there is a one-to-one mapping between the elements of $S_{Y_1}$ and those of $S_{Y_2}$. The detailed proof is given in Subsection II-B.
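Theorem 2 can be sanity-checked empirically, although a simulation is no substitute for the proof. The Python sketch below is a self-contained re-implementation of the update rules, with `None` standing for $\phi$. It draws tosses from a coin with an assumed bias p = 0.3, stops as soon as k = 2 bits have been emitted, and tallies the resulting 2-bit outputs. The bias, trial count and seed are arbitrary choices. Each pattern should appear with a frequency close to 1/4.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


class Node:
    def __init__(self) -> None:
        self.label = None      # None (phi), 'H', 'T', 0 or 1
        self.left = None
        self.right = None


def receive(u: Node, y: str, out: List[int]) -> None:
    """Status-tree update rules 1-3 for one incoming symbol."""
    x = u.label
    if x is None:
        u.label = y
    elif x in (0, 1):
        out.append(x)
        u.label = y
    else:
        if u.left is None:
            u.left, u.right = Node(), Node()
        if x == y:
            u.label = None
            receive(u.left, "T", out)
            receive(u.right, x, out)
        else:
            u.label = 1 if x == "H" else 0
            receive(u.left, "H", out)


def first_k_bits(rng: random.Random, p: float, k: int) -> Tuple[int, ...]:
    """Toss a coin with P(H) = p until k bits are out; return exactly k bits."""
    root, out = Node(), []
    while len(out) < k:
        receive(root, "H" if rng.random() < p else "T", out)
    return tuple(out[:k])


def prefix_frequencies(p: float = 0.3, k: int = 2, trials: int = 20000,
                       seed: int = 1) -> Dict[Tuple[int, ...], float]:
    """Empirical frequency of each k-bit output over independent runs."""
    rng = random.Random(seed)
    counts = Counter(first_k_bits(rng, p, k) for _ in range(trials))
    return {y: c / trials for y, c in sorted(counts.items())}


if __name__ == "__main__":
    print(prefix_frequencies())
```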

Theorem 3. Given a biased coin with probability p of being H, let n be the number of coin tosses required for generating k random bits in the random-stream algorithm. Then

$$\lim_{k \to \infty} \frac{E[n]}{k} = \frac{1}{H(p)}.$$

The proof of Theorem 3 is based on the fact that the random-stream algorithm is as efficient as Peres's algorithm. The difference is that in Peres's algorithm the input length is fixed and the output length is variable, whereas in the random-stream algorithm the output length is fixed and the input length is variable. So the key of the proof is to connect these two cases. The detailed proof is given in Subsection II-C.
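A quick simulation illustrates the statement of Theorem 3. The Python sketch below is again a self-contained re-implementation of our own. The bias p = 0.3, the number of tosses and the seed are arbitrary choices. It feeds a long stream of tosses through the unbounded status tree and reports the observed number of tosses per output bit next to the limit $1/H(p) \approx 1.1347$. For a finite stream, the observed ratio is expected to stay somewhat above this limit.

```python
import math
import random
from typing import List


class Node:
    def __init__(self) -> None:
        self.label = None      # None (phi), 'H', 'T', 0 or 1
        self.left = None
        self.right = None


def receive(u: Node, y: str, out: List[int]) -> None:
    """Status-tree update rules 1-3 for one incoming symbol."""
    x = u.label
    if x is None:
        u.label = y
    elif x in (0, 1):
        out.append(x)
        u.label = y
    else:
        if u.left is None:
            u.left, u.right = Node(), Node()
        if x == y:
            u.label = None
            receive(u.left, "T", out)
            receive(u.right, x, out)
        else:
            u.label = 1 if x == "H" else 0
            receive(u.left, "H", out)


def observed_tosses_per_bit(p: float, n: int, seed: int = 7) -> float:
    """Feed n tosses of a coin with P(H) = p and return n / (bits produced)."""
    rng = random.Random(seed)
    root, out = Node(), []
    for _ in range(n):
        receive(root, "H" if rng.random() < p else "T", out)
    return n / len(out)


def entropy(p: float) -> float:
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


if __name__ == "__main__":
    p = 0.3
    print(observed_tosses_per_bit(p, 100_000), 1 / entropy(p))
```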

So far, we can conclude that the random-stream algorithm can generate a stream of random bits from an arbitrary biased coin with asymptotically optimal efficiency. However, the size of the binary tree increases as the number of input coin tosses increases. The longest path of the tree is the left-most path, in which each node passes one message to the next node when it receives two messages from its previous node. Hence, the maximum depth of the tree is $\log_2 n$ for n input bits. The space cost therefore keeps growing with n, which is a practical challenge. Our observation is that we can control the size of the space by limiting the maximum depth of the tree: if a node's depth reaches a certain threshold, it stops creating new leaves. We can prove that this method correctly generates a stream of random bits from an arbitrary biased coin. We call this method the random-stream algorithm with maximum depth d.
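One way to realize the depth limit is sketched below in Python. This is our own illustration, and the parameter name `max_depth` is ours. We assume that a node at the maximum depth still updates its own label by rules 1 to 3 but drops the messages it would otherwise pass to its children. The simulation estimates the expected number of tosses per bit, which can be compared with the entries for depths 0 and 1 in Table I.

```python
import random
from typing import List


class Node:
    def __init__(self) -> None:
        self.label = None      # None (phi), 'H', 'T', 0 or 1
        self.left = None
        self.right = None


def receive(u: Node, y: str, out: List[int], depth: int, max_depth: int) -> None:
    """Rules 1-3, except that a node at depth == max_depth never creates children."""
    x = u.label
    if x is None:
        u.label = y
        return
    if x in (0, 1):
        out.append(x)
        u.label = y
        return
    can_grow = depth < max_depth
    if can_grow and u.left is None:
        u.left, u.right = Node(), Node()
    if x == y:
        u.label = None
        if can_grow:
            receive(u.left, "T", out, depth + 1, max_depth)
            receive(u.right, x, out, depth + 1, max_depth)
    else:
        u.label = 1 if x == "H" else 0
        if can_grow:
            receive(u.left, "H", out, depth + 1, max_depth)


def tosses_per_bit(p: float, max_depth: int, n: int = 100_000, seed: int = 3) -> float:
    """Simulate n tosses with P(H) = p and return the observed tosses per bit."""
    rng = random.Random(seed)
    root, out = Node(), []
    for _ in range(n):
        receive(root, "H" if rng.random() < p else "T", out, 0, max_depth)
    return n / len(out)


if __name__ == "__main__":
    for d in (0, 1, 2):
        print(d, round(tosses_per_bit(0.3, d), 3))
```

With `max_depth=0` only the root exists, and the tree behaves like von Neumann's scheme with a one-toss delay on each output bit.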

Theorem 4. Given a source of biased coins with unknown probability, the random-stream algorithm with maximum depth d generates a stream of random bits, i.e., for any k > 0, if we stop running the algorithm after generating k bits then these k bits are independent and unbiased.

The proof of Theorem 4 is a simple modification of the proof of Theorem 2 and is given in Subsection II-D. Saving memory space in this way costs some efficiency. Fortunately, as the maximum depth increases, the efficiency of this method quickly converges to the theoretical limit.

Example 3. When the maximum depth of the tree is 0 (it has only the root node), the algorithm is approximately von Neumann's scheme. Since each pair of tosses produces a bit with probability 2pq, the expected number of coin tosses required per random bit is

$$\frac{1}{pq}$$

asymptotically, where $q = 1 - p$ and p is the probability of the biased coin being H.

Example 4. When the maximum depth of the tree is 1, the expected number of coin tosses required per random bit is

$$\frac{1}{pq + \frac{1}{2}(p^2+q^2)(2pq) + \frac{1}{2}(p^2+q^2)\frac{p^2q^2}{(p^2+q^2)^2}}$$

asymptotically, where $q = 1 - p$ and p is the probability of the biased coin being H.

Generally, if the maximum depth of the tree is d, then we can calculate the efficiency of the random-stream algorithm by iteration in the following way.

Theorem 5. When the maximum depth of the tree is d and the probability of the biased coin being H is p, the expected number of coin tosses required per random bit is

$$\frac{1}{\rho_d(p)}$$

asymptotically, where $\rho_d(p)$ can be obtained by iterating

$$\rho_d(p) = pq + \frac{1}{2}\rho_{d-1}(p^2+q^2) + \frac{1}{2}(p^2+q^2)\,\rho_{d-1}\!\left(\frac{p^2}{p^2+q^2}\right) \qquad (1)$$

with $q = 1 - p$ and $\rho_0(p) = pq$.
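Equation (1) is easy to evaluate numerically. The direct recursive Python transcription below uses function names of our own. It makes $2^d$ calls, which is fine for the depths in Table I. It reproduces, for example, the entries 2.7040 and 2.0299 for p = 0.3 at depths 1 and 2.

```python
def rho(d: int, p: float) -> float:
    """rho_d(p) from Eq. (1): expected random bits per coin toss at maximum depth d."""
    q = 1.0 - p
    if d == 0:
        return p * q
    s = p * p + q * q                      # probability that a pair is HH or TT
    return (p * q
            # left child: one symbol per pair, T with probability s
            # (rho is symmetric under p <-> 1 - p, so the orientation does not matter)
            + 0.5 * rho(d - 1, s)
            # right child: one symbol per equal pair, H with probability p^2 / s
            + 0.5 * s * rho(d - 1, p * p / s))


def expected_tosses_per_bit(d: int, p: float) -> float:
    return 1.0 / rho(d, p)


if __name__ == "__main__":
    for d in (0, 1, 2, 3, 4, 5, 7, 10, 15):
        print(d, [round(expected_tosses_per_bit(d, p), 4) for p in (0.1, 0.2, 0.3, 0.4, 0.5)])
```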

Theorem 5 shows that the efficiency of a random-stream algorithm with maximum depth d can be easily calculated by iteration. One thing that we can claim is that

$$\lim_{d \to \infty} \rho_d(p) = H(p).$$

However, it is difficult to get an explicit expression for $\rho_d(p)$ when d is finite. As d increases, the convergence rate of $\rho_d(p)$ depends on the value of p. The following extreme case shows that $\rho_d(p)$ can converge to $H(p)$ very quickly.



TABLE I
The expected number of coin tosses required per random bit for different probability p and different maximum depths

maximum depth   p=0.1     p=0.2    p=0.3    p=0.4    p=0.5
0               11.1111   6.2500   4.7619   4.1667   4.0000
1               5.9263    3.4768   2.7040   2.3799   2.2857
2               4.2857    2.5816   2.0299   1.7990   1.7297
3               3.5102    2.1484   1.7061   1.5190   1.4629
4               3.0655    1.9023   1.5207   1.3596   1.3111
5               2.7876    1.7480   1.4047   1.2598   1.2165
7               2.4764    1.5745   1.2748   1.1485   1.1113
10              2.2732    1.4619   1.1910   1.0772   1.0441
15              2.1662    1.4033   1.1478   1.0408   1.0101
∞               2.1322    1.3852   1.1347   1.0299   1.0000



[Figure: efficiency versus probability p (horizontal axis from 0.1 to 0.9), one curve per maximum depth (depth=1, depth=2, ...).]

Fig. 2. The efficiency for different probability p and different maximum depths.



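Example 5. Let us consider the case $p = \frac{1}{2}$. Then $p^2 + q^2 = \frac{1}{2}$ and $\frac{p^2}{p^2+q^2} = \frac{1}{2}$, so Equation (1) gives

$$\rho_d\!\left(\tfrac{1}{2}\right) = \frac{1}{4} + \frac{1}{2}\rho_{d-1}\!\left(\tfrac{1}{2}\right) + \frac{1}{4}\rho_{d-1}\!\left(\tfrac{1}{2}\right),$$

where $\rho_0(\frac{1}{2}) = \frac{1}{4}$. From this iterative relation we obtain

$$\rho_d\!\left(\tfrac{1}{2}\right) = 1 - \left(\tfrac{3}{4}\right)^{d+1}.$$

So when $p = \frac{1}{2}$, $\rho_d(p)$ converges to $H(p) = 1$ very quickly as d increases.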
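The closed form in Example 5 can be confirmed with exact rational arithmetic. In this small Python check of our own, the specialized recursion is iterated with `fractions.Fraction` and compared with $1 - (\frac{3}{4})^{d+1}$. The induction step behind the closed form is $1 - \rho_d = \frac{3}{4}(1 - \rho_{d-1})$.

```python
from fractions import Fraction


def rho_half(d: int) -> Fraction:
    """rho_d(1/2) via rho_d = 1/4 + (1/2) rho_{d-1} + (1/4) rho_{d-1}."""
    r = Fraction(1, 4)                     # rho_0(1/2) = pq = 1/4
    for _ in range(d):
        r = Fraction(1, 4) + Fraction(1, 2) * r + Fraction(1, 4) * r
    return r


def closed_form(d: int) -> Fraction:
    return 1 - Fraction(3, 4) ** (d + 1)


if __name__ == "__main__":
    for d in range(6):
        # 1 / rho_d(1/2) is the p = 0.5 column of Table I
        print(d, rho_half(d), float(1 / rho_half(d)))
```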

In Table I we tabulate the expected number of coin tosses required per random bit in the random-stream algorithm with different maximum depths. We see that as the maximum depth increases, the efficiency of the random-stream algorithm quickly approaches the theoretical limit. Let us consider the case p = 0.3 as an example. If the maximum depth is 0, the random-stream algorithm is as efficient as von Neumann's scheme, which requires an expected 4.76 coin tosses to generate one random bit. If the maximum depth is 7, it requires only an expected 1.27 coin tosses to generate one random bit, which is very close to the theoretical limit of 1.13. However, the space cost of the algorithm depends exponentially on the maximum depth, so real applications must balance efficiency against space cost. Specifically, if we define efficiency as the ratio between the theoretical lower bound and the actual expected number of coin tosses, then Fig. 2 shows the relation between the efficiency and the maximum depth for different probabilities p.

TABLE II
The expected time for processing a single input coin toss for different probability p and different maximum depths

maximum depth   p=0.1     p=0.2    p=0.3    p=0.4    p=0.5
0               1.0000    1.0000   1.0000   1.0000   1.0000
1               1.9100    1.8400   1.7900   1.7600   1.7500
2               2.7413    2.5524   2.4202   2.3398   2.3125
3               3.5079    3.1650   2.9275   2.7840   2.7344
4               4.2230    3.6996   3.3414   3.1256   3.0508
5               4.8968    4.1739   3.6838   3.3901   3.2881
7               6.1540    4.9940   4.2188   3.7587   3.5995
10              7.9002    6.0309   4.8001   4.0783   3.8311
15              10.6458   7.5383   5.5215   4.3539   3.9599

Fig. 3. An example for demonstrating Lemma 6, where the input sequence for (a) is HTTTHT, and the input sequence for (b) is TTHTHT.

Another property that we consider is the expected time for processing a single coin toss. Assume that it takes a single unit of time to process a message received at a node. Then the expected time is exactly the expected number of messages generated in the status tree (including the input coin toss itself). Table II shows the expected time for processing a single input bit when the input is infinitely long, which indicates the computational efficiency of the random-stream algorithm with limited depth. It can be proved that for an input generated by an arbitrary biased coin, the expected time for processing a single coin toss is upper bounded by the maximum depth plus one (this bound is not tight).
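The entries of Table II can be reproduced by a recursion of the same shape as Equation (1). The recursion below is our own derivation and is not a formula stated in this paper. The root handles one message per toss. Its left child receives one message per pair of tosses, T with probability $p^2+q^2$. Its right child receives one message per equal pair, H with probability $p^2/(p^2+q^2)$. Writing $t_d(p)$ for the expected number of messages per toss with maximum depth d,

$$t_d(p) = 1 + \tfrac{1}{2}t_{d-1}(p^2+q^2) + \tfrac{1}{2}(p^2+q^2)\,t_{d-1}\!\left(\tfrac{p^2}{p^2+q^2}\right), \qquad t_0(p) = 1.$$

Since $p^2 + q^2 \le 1$, induction gives $t_d(p) \le d + 1$, which is consistent with the bound stated above. A direct Python transcription follows; the function name is ours. It matches the Table II entries we checked.

```python
def expected_messages(d: int, p: float) -> float:
    """Expected messages processed per coin toss with maximum depth d (our recursion)."""
    if d == 0:
        return 1.0                         # only the root handles the toss
    q = 1.0 - p
    s = p * p + q * q
    return (1.0
            + 0.5 * expected_messages(d - 1, s)
            + 0.5 * s * expected_messages(d - 1, p * p / s))


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, [round(expected_messages(d, p), 4) for p in (0.1, 0.2, 0.3, 0.4, 0.5)])
```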

B. Proof of Theorem 2

In this subsection, we prove Theorem 2.

Lemma 6. Let T be the status tree induced by $X_A \in \{H,T\}^*$, and let $k_1, k_2, \ldots, k_{|T|}$ be the numbers of bits generated by the nodes in T, where $|T|$ is the number of nodes in T. Then for any $y_i \in \{0,1\}^{k_i}$ with $1 \le i \le |T|$, there exists a unique sequence $X_B \in \{H,T\}^*$ that induces the same status tree T and such that the bits generated by the i-th node in T are $y_i$. Such a sequence $X_B$ is a permutation of $X_A$ with the same last element.

Proof: To prove this conclusion, we apply the idea of Lemma 1. If the maximum depth of T is zero, the conclusion is trivial. Assume that the conclusion holds for any status tree with maximum depth at most k; we show that it also holds for any status tree with maximum depth k + 1.

Given a status tree T with maximum depth k + 1, we use $Y_A \in \{0,1\}^*$ to denote the binary sequence generated by the root node based on $X_A$, and $L_A$, $R_A$ to denote the sequences of symbols received by its left child and right child. By the induction hypothesis, after flipping the bits generated by the left subtree as required, we can construct a unique sequence $L_B \in \{H,T\}^*$ such that $L_B$ is a permutation of $L_A$ with the same last element. Similarly, for the right subtree, there is a unique $R_B \in \{H,T\}^*$ such that $R_B$ is a permutation of $R_A$ with the same last element.

Assume that by flipping the bits generated by the root node, we get a binary sequence $Y_B$ with $|Y_B| = |Y_A|$ (if the label $a \in \{0, 1\}$, we append it to both $Y_A$ and $Y_B$). Since $|L_B| = |L_A|$ and $|R_B| = |R_A|$, we have

$$|L_B| = |Y_B| + |R_B|,$$

which implies that we can construct $X_B$ uniquely from $L_B$, $R_B$, $Y_B$ and the label a of the root node (according to the proof of Lemma 1). Since the length of $X_B$ is uniquely determined by $|L_B|$ and a, we can also conclude that $X_A$ and $X_B$ have the same length.

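To see that $X_B$ is a permutation of $X_A$, we show that $X_B$ has the same number of H's as $X_A$. Given a sequence $X \in \{H,T\}^*$, let $w_H(X)$ denote the number of H's in X. Each H in L corresponds to a pair HT or TH in X, which contains one H. Each T in L corresponds to a pair HH or TT in X, whose symbol is recorded in R. Hence

$$w_H(X_A) = w_H(L_A) + 2w_H(R_A) + w_H(a),$$
$$w_H(X_B) = w_H(L_B) + 2w_H(R_B) + w_H(a),$$

where $w_H(L_A) = w_H(L_B)$ and $w_H(R_A) = w_H(R_B)$ by the induction hypothesis. Hence $w_H(X_A) = w_H(X_B)$, and $X_B$ is a permutation of $X_A$.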

Finally, we show that $X_A$ and $X_B$ have the same last element. If $a \in \{H, T\}$, then both $X_A$ and $X_B$ end with a. If $a \in \{\phi, 0, 1\}$, the last element of $X_B$ is determined by the last element of $L_B$, the last element of $R_B$, and a. The induction hypothesis gives that $L_B$ has the same last element as $L_A$ and that $R_B$ has the same last element as $R_A$. So we can conclude that $X_A$ and $X_B$ have the same last element.

This completes the proof. ■

Example 6. The status tree of the sequence HTTTHT is given in Fig. 3(a). If we flip the second generated bit from 1 to 0 (see Fig. 3(b)), we can construct a new sequence of coin tosses, namely TTHTHT.
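Lemma 6 and Example 6 can be replayed mechanically. The Python sketch below is our own illustration, and the helper names are hypothetical. It runs the update rules while recording the bits each node emits, and it inverts a tree with the procedure from the proof of Lemma 1. It shows that flipping the bit emitted by the left node of HTTTHT yields TTHTHT. That sequence induces the same status tree with the same per-node bit counts, as Definition 1 requires.

```python
from itertools import product
from typing import List, Optional


class Node:
    def __init__(self) -> None:
        self.label = None                  # None (phi), 'H', 'T', 0 or 1
        self.bits: List[int] = []          # bits emitted by this node so far
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def receive(u: Node, y: str, out: List[int]) -> None:
    """Status-tree update rules 1-3, also recording each node's emitted bits."""
    x = u.label
    if x is None:
        u.label = y
    elif x in (0, 1):
        u.bits.append(x)
        out.append(x)
        u.label = y
    else:
        if u.left is None:
            u.left, u.right = Node(), Node()
        if x == y:
            u.label = None
            receive(u.left, "T", out)
            receive(u.right, x, out)
        else:
            u.label = 1 if x == "H" else 0
            receive(u.left, "H", out)


def build(tosses: str) -> Node:
    root, out = Node(), []
    for t in tosses:
        receive(root, t, out)
    return root


def reconstruct(u: Node) -> str:
    """Recover the symbols received by u from its subtree (proof of Lemma 1)."""
    tail = u.label if u.label in ("H", "T") else ""
    if u.left is None:                     # a leaf holds at most one symbol
        return tail
    left = reconstruct(u.left)
    right = iter(reconstruct(u.right))
    ys = iter(u.bits + ([u.label] if u.label in (0, 1) else []))
    pairs = [("HT" if next(ys) == 1 else "TH") if s == "H" else next(right) * 2
             for s in left]
    return "".join(pairs) + tail


def signature(u: Optional[Node]):
    """Labels and per-node bit counts; equal signatures mean equivalent sequences."""
    if u is None:
        return None
    return (u.label, len(u.bits), signature(u.left), signature(u.right))


def roundtrip_ok(max_len: int) -> bool:
    """Check Lemma 1 on every toss sequence of length at most max_len."""
    return all(reconstruct(build("".join(s))) == "".join(s)
               for n in range(max_len + 1) for s in product("HT", repeat=n))


if __name__ == "__main__":
    tree = build("HTTTHT")
    tree.left.bits[0] ^= 1                 # flip the bit emitted by the left node
    print(reconstruct(tree))               # TTHTHT
```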

Now, we define an equivalence relation on {H, T}*. 
Definition 1. Let $T_A$ be the status tree of $X_A$ and $T_B$ be the status tree of $X_B$. Two sequences $X_A, X_B \in \{H,T\}^*$ are equivalent, denoted by $X_A \equiv X_B$, if and only if $T_A = T_B$ and, for each pair of nodes $(u, v)$ with $u \in T_A$ and $v \in T_B$ at the same position, u and v generate the same number of bits.

Let $S_Y$, with $Y \in \{0,1\}^k$, denote the set of all sequences yielding Y. Here, we say that a sequence X yields Y if and only if $X[1 : |X| - 1]$ generates a sequence shorter than Y and X generates a sequence with Y as a prefix (including Y itself). Namely, letting f be the function of the random-stream algorithm,

$$|f(X[1 : |X| - 1])| < |Y|, \qquad f(X) = YA \text{ with } A \in \{0,1\}^*.$$

To prove that the algorithm can generate random-bit streams, we show that for any distinct binary sequences $Y_1, Y_2 \in \{0,1\}^k$, there is a one-to-one mapping between the elements of $S_{Y_1}$ and those of $S_{Y_2}$.

Lemma 7. Let f be the function of the random-stream algorithm. For any distinct binary sequences $Y_1, Y_2 \in \{0,1\}^k$, if $X_A \in S_{Y_1}$, there is exactly one sequence $X_B \in S_{Y_2}$ such that

• $X_B \equiv X_A$;

• $f(X_A) = Y_1 A$ and $f(X_B) = Y_2 A$ for some binary sequence $A \in \{0,1\}^*$.

Proof: We prove this conclusion by induction on k. Here, we use $X'_A$ to denote the prefix of $X_A$ of length $|X_A| - 1$, and $\beta$ to denote the last symbol of $X_A$, so $X_A = X'_A\beta$.

When k = 1, if $X_A \in S_0$, we can write $f(X_A)$ as $0A$ for some $A \in \{0,1\}^*$. In this case, let $T_A$ be the status tree of $X'_A$, and let u be the node in $T_A$ that generates the first bit when reading the symbol $\beta$. If we flip the label of u from 0 to 1, we get another status tree, denoted by $T_B$. Using the same argument as in Lemma 1, we can construct a sequence $X'_B$ whose status tree is $T_B$ and which does not generate any bits. Concatenating $X'_B$ with $\beta$ results in a new sequence $X_B = X'_B\beta$ such that $X_B \equiv X_A$ and $f(X_B) = 1A$. Similarly, for any sequence $X_B$ that yields 1, i.e., $X_B \in S_1$, if $f(X_B) = 1A$, we can find a sequence $X_A \in S_0$ such that $X_A \equiv X_B$ and $f(X_A) = 0A$. So there is a one-to-one mapping between the elements of $S_0$ and those of $S_1$.

Now we assume that there is a one-to-one mapping between the elements of $S_{Y_1}$ and $S_{Y_2}$ for all $Y_1, Y_2 \in \{0,1\}^k$, and we show that this conclusion also holds for any $Y_1, Y_2 \in \{0,1\}^{k+1}$. Two cases need to be considered.

1) $Y_1, Y_2$ end with the same bit. Without loss of generality, we assume this bit is 0; then we can write $Y_1 = Y'_1 0$ and $Y_2 = Y'_2 0$. If $X_A \in S_{Y'_1}$, then we can write $f(X_A) = Y'_1 A'$, in which the first bit of $A'$ is 0. By the induction hypothesis, there exists a sequence $X_B \in S_{Y'_2}$ such that $X_B \equiv X_A$ and $f(X_B) = Y'_2 A'$. In this case, if we write $f(X_A) = Y_1 A = Y'_1 0A$, then $f(X_B) = Y'_2 A' = Y'_2 0A = Y_2 A$. So such a sequence $X_B$ satisfies our requirements.

If $X_A \notin S_{Y'_1}$, then $Y'_1$ has been generated before reading the symbol $\beta$. Let us consider the prefix of $X_A$, denoted by $X^*_A$, that yields $Y'_1$. In this case, $f(X^*_A) = Y'_1$, and we can write $X_A = X^*_A Z$. By the induction hypothesis, there exists exactly one sequence $X^*_B$ such that $X^*_B \equiv X^*_A$ and $f(X^*_B) = Y'_2$. Since $X^*_A$ and $X^*_B$ induce the same status tree, if we construct the sequence $X_B = X^*_B Z$, then $X_B \equiv X_A$, and $X_B$ generates the same bits as $X_A$ when reading the symbols of Z. It is easy to see that such a sequence $X_B$ satisfies our requirements.

Since this result also holds in the reverse direction, if $Y_1, Y_2$ end with the same bit, there is a one-to-one mapping between the elements of $S_{Y_1}$ and $S_{Y_2}$.

2) $Y_1, Y_2$ end with different bits. Without loss of generality, we assume that $Y_1 = Y'_1 0$ and $Y_2 = Y'_2 1$. According to the argument above, there is a one-to-one mapping between the elements of $S_{0^k 0}$ and $S_{Y'_1 0}$, and between the elements of $S_{0^k 1}$ and $S_{Y'_2 1}$, where $0^k$ denotes k zeros. So our task is to prove that there is a one-to-one mapping between $S_{0^k 0}$ and $S_{0^k 1}$. For any sequence $X_A \in S_{0^k 0}$, let $X'_A$ be its prefix of length $|X_A| - 1$. Here, $X'_A$ generates only zeros, and at most k of them. Let $T_A$ denote the status tree of $X'_A$, and let u be the node in $T_A$ that generates the (k+1)-th bit (a zero) when reading the symbol $\beta$. Then we can construct a new sequence $X'_B$ with status tree $T_B$ such that

• $T_A$ and $T_B$ are the same except that the label of u is 0 and the label of the node at the same position in $T_B$ is 1;

• for each node w in $T_A$, if v is its corresponding node at the same position in $T_B$, then w and v generate the same bits.

The construction of $X'_B$ follows the proof of Lemma 6. If we construct the sequence $X_B = X'_B\beta$, it is not hard to show that $X_B$ satisfies our requirements, i.e.,

• $X_B \equiv X_A$;

• $X'_B$ generates fewer than k + 1 bits, i.e., $|f(X'_B)| \le k$;

• if $f(X_A) = 0^k 0A$, then $f(X_B) = 0^k 1A$.

By the reverse argument as well, there is a one-to-one mapping between the elements of $S_{0^k 0}$ and $S_{0^k 1}$. So if $Y_1, Y_2$ end with different bits, there is a one-to-one mapping between the elements of $S_{Y_1}$ and $S_{Y_2}$.

Finally, we conclude that there is a one-to-one mapping between the elements of $S_{Y_1}$ and $S_{Y_2}$ for any $Y_1, Y_2 \in \{0,1\}^k$ with $k > 0$.

This completes the proof. ■
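For small inputs, the base case k = 1 of Lemma 7 can be checked exhaustively. The Python sketch below is our own check and not part of the proof. It enumerates every toss sequence of length at most 12 and keeps those whose first output bit appears exactly at the last toss, i.e., the members of $S_0$ and $S_1$ of that length. It then counts them by length, number of H's and value of that bit. Because the mapping in Lemma 7 pairs equivalent sequences, which are permutations of each other by Lemma 6, the counts for bit 0 and bit 1 should agree in every (length, weight) class.

```python
from collections import Counter
from itertools import product
from typing import List


class Node:
    def __init__(self) -> None:
        self.label = None      # None (phi), 'H', 'T', 0 or 1
        self.left = None
        self.right = None


def receive(u: Node, y: str, out: List[int]) -> None:
    """Status-tree update rules 1-3 for one incoming symbol."""
    x = u.label
    if x is None:
        u.label = y
    elif x in (0, 1):
        out.append(x)
        u.label = y
    else:
        if u.left is None:
            u.left, u.right = Node(), Node()
        if x == y:
            u.label = None
            receive(u.left, "T", out)
            receive(u.right, x, out)
        else:
            u.label = 1 if x == "H" else 0
            receive(u.left, "H", out)


def first_bit_classes(max_len: int) -> Counter:
    """Count sequences yielding their first bit at the last toss, keyed by (n, #H, bit)."""
    counts: Counter = Counter()
    for n in range(1, max_len + 1):
        for seq in product("HT", repeat=n):
            root, out = Node(), []
            for t in seq[:-1]:
                receive(root, t, out)
            if out:                        # a bit appeared before the last toss
                continue
            receive(root, seq[-1], out)
            if out:
                counts[(n, seq.count("H"), out[0])] += 1
    return counts


def balanced(counts: Counter) -> bool:
    classes = {(n, w) for n, w, _ in counts}
    return all(counts[(n, w, 0)] == counts[(n, w, 1)] for n, w in classes)
```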

Theorem 2. Given a source of biased coins with unknown probability, the random-stream algorithm generates a stream of random bits, i.e., for any k > 0, if we stop running the algorithm after generating k bits then these k bits are independent and unbiased.

Proof: According to Lemma 7, for any $Y_1, Y_2 \in \{0,1\}^k$, there is a one-to-one mapping between the elements of $S_{Y_1}$ and $S_{Y_2}$ that pairs each sequence with an equivalent one. Equivalent sequences are permutations of each other (Lemma 6), so paired sequences have the same probability of being generated. Hence, the probability of generating a sequence in $S_{Y_1}$ is equal to that of generating a sequence in $S_{Y_2}$. It follows that $Y_1$ and $Y_2$ have the same probability of being generated for a fixed k. Since this is true for any $Y_1, Y_2 \in \{0,1\}^k$, the probability of generating an arbitrary binary sequence $Y \in \{0,1\}^k$ is $2^{-k}$, which is the statement of the theorem.

This completes the proof. ■
C. Proof of Theorem 3

Lemma 8. Given a stream of biased coin tosses, where the probability of generating H is p, we run the random-stream algorithm until the number of coin tosses reaches l. Let m be the number of random bits generated. Then for any $\epsilon, \delta > 0$, if l is large enough, we have

$$P\left[\frac{m - lH(p)}{lH(p)} < -\epsilon\right] < \delta,$$

where

$$H(p) = -p\log_2 p - (1-p)\log_2(1-p)$$

is the entropy of the biased coin.

Proof: If we consider the case of fixed input length, 
then the random-stream algorithm is as efficient as Peres's 
algorithm asymptotically. Using the same proof given in [5J 
for Peres's algorithm, we can get 

lin. ^ = Hip). 

Given a sequence of coin tosses of length we want to 
prove that for any e > 0, 



„,ni — E\m] 
lim P\ — \ ^ < -e 



E\m] 



0. 



To prove this result, we assume that this limitation holds 
for e = ei, i.e., for any (5 > 0, if Hs large enough, then 

m - E[m] 
t<j[ni\ 

Under this assumption, we show that there always exists £2 < 
€1 such that the limitation also holds for e = 62- Hence, the 
value of e can be arbitrarily small. 

In the random-stream algorithm, / is the number of symbols 
(coin tosses) received by the root. Let mi be the number of 
random bits generated by the root, TO(;) be the number of 
random bits generated by its left subtree and mj^) be the 
number of random bits generated by its right subtree. Then 
it is easy to get 

m = TOi + m(;) + m(r). 

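Since the $m_1$ random bits generated by the root node are independent, we can always make $l$ large enough such that

$$P\left[\frac{m_1 - E[m_1]}{E[m_1]} < -\frac{\epsilon_1}{2}\right] < \frac{\delta}{3}.$$

At the same time, by making $l$ large enough, we can make both $m_{(l)}$ and $m_{(r)}$ large enough such that (based on our assumption)

$$P\left[\frac{m_{(l)} - E[m_{(l)}]}{E[m_{(l)}]} < -\epsilon_1\right] < \frac{\delta}{3}$$

and

$$P\left[\frac{m_{(r)} - E[m_{(r)}]}{E[m_{(r)}]} < -\epsilon_1\right] < \frac{\delta}{3}.$$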



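Based on the three inequalities above and the union bound, we get

$$P\left[m - E[m] < -\epsilon_1\left(\frac{E[m_1]}{2} + E[m_{(l)}] + E[m_{(r)}]\right)\right] < \delta.$$

If we set

$$\epsilon_2 = \epsilon_1\,\frac{\frac{E[m_1]}{2} + E[m_{(l)}] + E[m_{(r)}]}{E[m_1 + m_{(l)} + m_{(r)}]},$$

then

$$P\left[\frac{m - E[m]}{E[m]} < -\epsilon_2\right] < \delta.$$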

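Given the probability $p$ of the coin, when $l$ is large,

$$E[m_1] = \Theta(E[m]),\quad E[m_{(l)}] = \Theta(E[m]),\quad E[m_{(r)}] = \Theta(E[m]),$$

which implies that $\epsilon_2 < \epsilon_1$. More precisely, since $E[m] = E[m_1] + E[m_{(l)}] + E[m_{(r)}]$, we have $\epsilon_2 = \epsilon_1\left(1 - \frac{E[m_1]}{2E[m]}\right)$, and the factor in parentheses stays bounded away from 1, so repeating the argument makes $\epsilon$ arbitrarily small.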

So we can conclude that for any $\epsilon > 0$ and $\delta > 0$, if $l$ is large enough, then

$$P\left[\frac{m - E[m]}{E[m]} < -\epsilon\right] < \delta.$$

Combining this with the fact that $E[m]/l \to H(p)$, we get the result in the lemma. ■
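The proof above relies on the fact that, for a fixed input length, the random-stream algorithm is asymptotically as efficient as Peres's algorithm. As a point of reference, the following is a minimal Python sketch of Peres's fixed-length procedure (not the streaming status-tree algorithm of this paper), with an optional depth cap. The names `peres` and `output_distribution`, and the encoding of tosses as strings over H/T, are our own choices. The helper enumerates every input of a small fixed length with exact rational probabilities, so the property that all outputs of the same length are equally likely can be checked directly.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Optional


def peres(x: str, depth: Optional[int] = None) -> str:
    """Peres's procedure on a fixed string over {'H', 'T'}; returns bits as '0'/'1'.

    depth=None recurses without limit; depth=0 is von Neumann's procedure alone.
    """
    if len(x) < 2:
        return ""
    out, u, v = [], [], []
    for a, b in zip(x[0::2], x[1::2]):   # disjoint pairs; an odd last toss is ignored
        if a != b:
            out.append("1" if a == "H" else "0")   # HT -> 1, TH -> 0
            u.append("T")                          # u records whether each pair agreed
        else:
            u.append("H")
            v.append(a)                            # v keeps one symbol per agreeing pair
    if depth == 0:
        return "".join(out)
    nxt = None if depth is None else depth - 1
    return "".join(out) + peres("".join(u), nxt) + peres("".join(v), nxt)


def output_distribution(n: int, p: Fraction,
                        depth: Optional[int] = None) -> Dict[str, Fraction]:
    """Exact P[output = y] over all 2**n inputs of length n, with P[H] = p."""
    dist: Dict[str, Fraction] = {}
    for tosses in product("HT", repeat=n):
        x = "".join(tosses)
        prob = p ** x.count("H") * (1 - p) ** x.count("T")
        y = peres(x, depth)
        dist[y] = dist.get(y, Fraction(0)) + prob
    return dist
```

For instance, grouping `output_distribution(8, Fraction(1, 3))` by output length should show every binary string of a given length with the same probability. The sequence `u` (H when a pair agrees) and the sequence `v` (one symbol per agreeing pair) correspond to the messages a node passes to its left and right children, which is the routing analyzed in Lemma 10.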

Theorem 3. Given a biased coin with probability $p$ of being H, let $n$ be the number of coin tosses required for generating $k$ random bits in the random-stream algorithm; then

$$\lim_{k\to\infty}\frac{E[n]}{k} = \frac{1}{H(p)}.$$



Proof: For any $\epsilon, \delta > 0$, we set $l = \frac{k}{H(p)}(1+\epsilon)$. According to the conclusion of the previous lemma, with probability at least $1 - \delta$, the output length is at least $k$ if the input length $l$ is fixed and large enough. In other words, if the output length is $k$, which is fixed, then with probability at least $1 - \delta$, the input length satisfies $n \le l$.

So with probability less than $\delta$, we require more than $l$ coin tosses. The worst case is that we did not generate any bits for the first $l$ coin tosses. In this case, we still need to generate $k$ random bits. As a result, the expected number of coin tosses required is at most $l + E[n]$.

Based on the analysis above, we derive

$$E[n] \le (1-\delta)l + \delta(l + E[n]),$$

then

$$E[n] \le \frac{l}{1-\delta} = \frac{k}{H(p)}\cdot\frac{1+\epsilon}{1-\delta}.$$

Since $\epsilon$ and $\delta$ can be arbitrarily small when $l$ (or $k$) is large enough,

$$\lim_{k\to\infty}\frac{E[n]}{k} \le \frac{1}{H(p)}.$$

Based on Shannon's theory [2], it is impossible to generate $k$ random bits from a source with expected entropy less than $k$. Hence

$$E[n]\,H(p) \ge k,$$

i.e.,

$$\lim_{k\to\infty}\frac{E[n]}{k} \ge \frac{1}{H(p)}.$$

Finally, we get the conclusion in the theorem. This completes the proof. ■
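To put Theorem 3 in numbers, the sketch below (function names are ours) compares the optimal asymptotic cost of $1/H(p)$ coin tosses per random bit with the cost of von Neumann's original procedure, which consumes two tosses per attempt and succeeds with probability $2pq$, i.e., $1/(pq)$ tosses per bit.

```python
import math


def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def optimal_tosses_per_bit(p: float) -> float:
    """Asymptotic E[n]/k from Theorem 3."""
    return 1.0 / binary_entropy(p)


def von_neumann_tosses_per_bit(p: float) -> float:
    """Two tosses per attempt, success probability 2pq, so 1/(pq) tosses per bit."""
    return 1.0 / (p * (1.0 - p))


for p in (0.5, 0.3, 0.1):
    print(f"p={p}: optimal {optimal_tosses_per_bit(p):.3f}, "
          f"von Neumann {von_neumann_tosses_per_bit(p):.3f} tosses per bit")
```

Even for a fair coin the gap is a factor of four (1 toss per bit versus 4), and it widens as the coin becomes more biased.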
D. Proof of Theorem 4

The proof of Theorem 4 is very similar to the proof of Theorem 2. Let $S_Y$ with $Y \in \{0,1\}^k$ denote the set consisting of all the binary sequences yielding $Y$ in the random-stream algorithm with limited maximum depth. Then for any distinct binary sequences $Y_1, Y_2 \in \{0,1\}^k$, the elements in $S_{Y_1}$ and those in $S_{Y_2}$ are in one-to-one correspondence. Specifically, we can get the following lemma.

Lemma 9. Let $f$ be the function of the random-stream algorithm with maximum depth $d$. For any distinct binary sequences $Y_1, Y_2 \in \{0,1\}^k$, if $X_A \in S_{Y_1}$, there exists one sequence $X_B \in S_{Y_2}$ such that

• $X_A \equiv X_B$.

• Let $T_A$ be the status tree of $X_A$ and $T_B$ be the status tree of $X_B$. For any node $u$ with depth larger than $d$ in $T_A$, let $v$ be its corresponding node in $T_B$ at the same position; then $u$ and $v$ generate the same bits.

• $f(X_A) = Y_1\Delta$ and $f(X_B) = Y_2\Delta$ for some binary sequence $\Delta \in \{0,1\}^*$.

Proof: The proof of this lemma is a simple modification of that for Lemma 7, which is by induction. A simple sketch is given as follows.

First, similar to the proof of Lemma 7, it can be proved that when $k = 1$, for any sequence $X_A \in S_0$, there exists one sequence $X_B \in S_1$ such that $X_A, X_B$ satisfy the conditions in the lemma, and vice versa. So we can say that the elements in $S_0$ and $S_1$ are in one-to-one correspondence. Then we assume that the elements in $S_{Y_1}$ and $S_{Y_2}$ are in one-to-one correspondence for all $Y_1, Y_2 \in \{0,1\}^k$, and we show that this conclusion also holds for any $Y_1, Y_2 \in \{0,1\}^{k+1}$. Two cases need to be considered.

1) $Y_1, Y_2$ end with the same bit. Without loss of generality, we assume this bit is 0; then we can write $Y_1 = Y_1'0$ and $Y_2 = Y_2'0$.

If $X_A \in S_{Y_1'}$, then according to our assumption, it is easy to prove the conclusion, i.e., there exists a sequence $X_B$ that satisfies the conditions.

If $X_A \notin S_{Y_1'}$, then we can write $X_A = \bar{X}_A Z$ with $\bar{X}_A \in S_{Y_1'}$. According to our assumption, for the sequence $\bar{X}_A$, we can find its mapping $\bar{X}_B$ such that (1) $\bar{X}_A \equiv \bar{X}_B$; (2) $\bar{X}_A, \bar{X}_B$ induce the same status tree and their corresponding nodes with depth larger than $d$ generate the same bits; and (3) $f(\bar{X}_A) = Y_1'$ and $f(\bar{X}_B) = Y_2'$. If we construct a sequence $X_B = \bar{X}_B Z$, it will satisfy all the conditions in the lemma.

Since this result is also true for the inverse case, if $Y_1, Y_2$ end with the same bit, the elements in $S_{Y_1}$ and $S_{Y_2}$ are in one-to-one correspondence.

2) $Y_1, Y_2$ end with different bits. Without loss of generality, we assume that $Y_1 = Y_1'0$ and $Y_2 = Y_2'1$. According to the argument above, the elements in $S_{0^k0}$ and $S_{Y_1}$ are in one-to-one correspondence, and the elements in $S_{0^k1}$ and $S_{Y_2}$ are in one-to-one correspondence. So we only need to prove that the elements in $S_{0^k0}$ and $S_{0^k1}$ are in one-to-one correspondence. In this case, for any $X_A \in S_{0^k0}$, let $X_A = X_A'\beta$ with a single symbol $\beta$. Then $X_A'$ generates only zeros, at most $k$ of them. Let $T_A$ denote the status tree of $X_A'$ and let $u$ be the node in $T_A$ that generates the $(k+1)$th bit (a zero) when reading the symbol $\beta$.



Note that the depth of $u$ is at most $d$. In this case, we can construct a new sequence $X_B'$ with status tree $T_B$ such that

• $T_B$ and $T_A$ are the same except that the label of $u$ is 0 and the label of the node at the same position in $T_B$ is 1.

• For each node $u$ in $T_A$, let $v$ be its corresponding node at the same position in $T_B$; then $u$ and $v$ generate the same bits.

Then we can prove that the sequence $X_B = X_B'\beta$ satisfies all the conditions in the lemma. Also based on the inverse argument, we can claim that the elements in $S_{0^k0}$ and $S_{0^k1}$ are in one-to-one correspondence.

Finally, we can conclude that the elements in $S_{Y_1}$ and $S_{Y_2}$ are in one-to-one correspondence for any $Y_1, Y_2 \in \{0,1\}^k$ with $k > 0$.

This completes the proof. ■

From the above lemma, it is easy to get Theorem 4.

Theorem 4. Given a source of biased coins with unknown probability, the random-stream algorithm with maximum depth $d$ generates a stream of random bits, i.e., for any $k > 0$, if we stop running the algorithm after generating $k$ bits, then these $k$ bits are independent and unbiased.

Proof: We can apply the same procedure as in the proof of Theorem 2. ■

E. Proof of Theorem 5

Similar to the proof of Theorem 3, we first consider the case that the input length is fixed.

Lemma 10. Given a stream of biased coin tosses, where the probability of generating H is $p$, we run the random-stream algorithm with maximum depth $d$ until the number of coin tosses reaches $l$. In this case, let $m$ be the number of random bits generated; then for any $\epsilon, \delta > 0$, if $l$ is large enough, we have that

$$P\left[\frac{m - l\rho_d(p)}{l\rho_d(p)} < -\epsilon\right] < \delta,$$

where $\rho_d(p)$ is given by the recursion in Theorem 5.



Proof: Let $\rho_d(p)$ be the asymptotic expected number of random bits generated per coin toss when the random-stream algorithm has maximum depth $d$ and the probability of the biased coin is $p$. Then

$$\lim_{l\to\infty}\frac{E[m]}{l} = \rho_d(p).$$

When the fixed input length $l$ is large enough, the random-stream algorithm generates approximately $l\rho_d(p)$ random bits, which are generated by the root node, the left subtree (the subtree rooted at the root's left child), and the right subtree (the subtree rooted at the root's right child). The root node generates approximately $lpq$ random bits, with $q = 1 - p$. Meanwhile, the root node passes approximately $\frac{l}{2}$ messages (H or T) to its left child, where the messages are independent and the probability of H is $p^2 + q^2$; and it passes approximately $\frac{l}{2}(p^2 + q^2)$ messages (H or T) to its right child, where the messages are independent and the probability of H is $\frac{p^2}{p^2+q^2}$. As a result, according to the definition of $\rho_d$, the left subtree generates approximately $\frac{l}{2}\rho_{d-1}(p^2+q^2)$ random bits, and the right subtree generates approximately $\frac{l}{2}(p^2+q^2)\rho_{d-1}\left(\frac{p^2}{p^2+q^2}\right)$ random bits. As $l \to \infty$, we have

$$\lim_{l\to\infty}\frac{l\rho_d(p)}{lpq + \frac{l}{2}\rho_{d-1}(p^2+q^2) + \frac{l}{2}(p^2+q^2)\rho_{d-1}\left(\frac{p^2}{p^2+q^2}\right)} = 1.$$

It yields

$$\rho_d(p) = pq + \frac{1}{2}\rho_{d-1}(p^2+q^2) + \frac{1}{2}(p^2+q^2)\rho_{d-1}\left(\frac{p^2}{p^2+q^2}\right).$$

So we can calculate $\rho_d(p)$ by iteration. When $d = 0$, the status tree has only the root node, and it is easy to get $\rho_0(p) = pq$. Then, following the proof of Lemma 8, for any $\epsilon, \delta > 0$, if $l$ is large enough, we have that

$$P\left[\frac{m - E[m]}{E[m]} < -\epsilon\right] < \delta.$$

So we can get the conclusion in the lemma. This completes the proof. ■

From the above lemma, we can get Theorem 5, that is,

Theorem 5. When the maximum depth of the tree is $d$ and the probability of the biased coin being H is $p$, the expected number of coin tosses required per random bit is

$$\frac{1}{\rho_d(p)}$$

asymptotically, where $\rho_d(p)$ can be obtained by iterating

$$\rho_d(p) = pq + \frac{1}{2}\rho_{d-1}(p^2+q^2) + \frac{1}{2}(p^2+q^2)\rho_{d-1}\left(\frac{p^2}{p^2+q^2}\right)$$

with $q = 1 - p$ and $\rho_0(p) = pq$.

Proof: We can apply the same procedure as in the proof of Theorem 3, except that we apply Lemma 10 instead of Lemma 8. ■
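The recursion in Theorem 5 is easy to evaluate numerically. The sketch below (the names `rho` and `binary_entropy` are ours) iterates it directly, memoizing on the pair $(d, p)$, and reports the entropy $H(p)$ that $\rho_d(p)$ approaches from below as the allowed depth grows.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def rho(d: int, p: float) -> float:
    """rho_d(p): asymptotic random bits per toss with maximum depth d (Theorem 5)."""
    q = 1.0 - p
    if d == 0:
        return p * q                       # root only: von Neumann, rho_0(p) = pq
    s = p * p + q * q                      # P[H] for messages sent to the left child
    return (p * q
            + 0.5 * rho(d - 1, s)          # left subtree sees about l/2 messages
            + 0.5 * s * rho(d - 1, p * p / s))  # right subtree: about (l/2)s messages


def binary_entropy(p: float) -> float:
    """H(p), the upper bound on rho_d(p) for every d."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


for d in (0, 1, 2, 4, 8):
    print(f"d={d}: {rho(d, 0.3):.4f} bits/toss, {1 / rho(d, 0.3):.3f} tosses/bit")
print(f"H(0.3) = {binary_entropy(0.3):.4f}")
```

For a fair coin the recursion collapses to $\rho_d(1/2) = \frac{1}{4} + \frac{3}{4}\rho_{d-1}(1/2)$, so $\rho_d(1/2) = 1 - (3/4)^{d+1}$, which makes the geometric approach to $H(1/2) = 1$ explicit.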

III. Generalized Random-Stream Algorithm 

A. Preliminary 

In [10], we introduced a universal scheme for transforming an arbitrary algorithm for generating random bits from a sequence of biased coin tosses into one that handles the general source of an $m$-sided die. This scheme works when the input is a sequence of fixed length; in this section, we study how to modify this scheme to generate random-bit streams from $m$-sided dice. For the sake of completeness, we describe the original scheme here.

The main idea of the scheme is to convert a sequence with an alphabet larger than two, written as

$$X = x_1x_2\cdots x_n \in \{0, 1, \ldots, m-1\}^n,$$

into multiple binary sequences. To do this, we create a binary tree, called a binarization tree, in which each node is labeled with a binary sequence of H and T. Given the binary representations of $x_i$ for all $1 \le i \le n$, the path of each node in the tree indicates a prefix, and the binary sequence labeled at this node consists of all the bits (H or T) following the prefix in the binary representations of $x_1, x_2, \ldots, x_n$ (if it exists). Fig. 4 is an instance of a binarization tree when the input sequence is $X = 012112210$, produced by a 3-sided die. To see this, we write each symbol (die roll) as a binary representation of length two; hence $X$ can be represented as

TT, TH, HT, TH, TH, HT, HT, TH, TT.

Collecting only the first bits of all the symbols yields an independent binary sequence

$$X_\phi = \mathrm{TTHTTHHTT},$$

which is labeled on the root node. Collecting the second bits following T, we get another independent binary sequence

$$X_T = \mathrm{THHHHT},$$

which is labeled on the left child of the root node.

Fig. 4. An instance of binarization tree.

The universal scheme says that we can 'treat' each binary sequence labeled on the binarization tree as a sequence of biased coin tosses: let $\Psi$ be any algorithm that can generate random bits from an arbitrary biased coin; then applying $\Psi$ to each of the sequences labeled on the binarization tree and concatenating their outputs together results in an independent and unbiased sequence, namely, a sequence of random bits.

Specifically, given the number of sides $m$ of a loaded die, the depth of the binarization tree is $b = \lceil\log_2 m\rceil - 1$. Let $\Upsilon_b$ denote the set consisting of all the binary sequences of length at most $b$, i.e.,

$$\Upsilon_b = \{\phi, T, H, TT, TH, HT, HH, \ldots, HH\cdots H\}.$$

Given $X \in \{0, 1, \ldots, m-1\}^n$, let $X_\gamma$ denote the binary sequence labeled on a node corresponding to a prefix $\gamma$ in the binarization tree; then we get a group of binary sequences

$$X_\phi, X_T, X_H, X_{TT}, X_{TH}, X_{HT}, X_{HH}, \ldots$$

For any function $\Psi$ that generates random bits from a fixed number of coin tosses, we can generate random bits from $X$ by calculating

$$\Psi(X_\phi) + \Psi(X_T) + \Psi(X_H) + \Psi(X_{TT}) + \Psi(X_{TH}) + \cdots,$$

where $A + B$ is the concatenation of $A$ and $B$.

So in the above example, the output of $X = 012112210$ is $\Psi(X_\phi) + \Psi(X_T)$, i.e.,

$$\Psi(\mathrm{TTHTTHHTT}) + \Psi(\mathrm{THHHHT}).$$

This conclusion is simple, but not obvious, since the binary 
sequences labeled on the same binarization tree are correlated 
with each other. 
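The fixed-length scheme can be made concrete with a short Python sketch. It assumes the fixed-width encoding used in the example (each symbol written with $\lceil\log_2 m\rceil$ bits, most significant bit first, with H standing for 1 and T for 0, so that 0, 1, 2 become TT, TH, HT for a 3-sided die), and it uses von Neumann's procedure as a stand-in for $\Psi$; any extractor for biased coins could be passed instead. The names `binarize`, `von_neumann`, and `universal_scheme` are ours.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def binarize(symbols: List[int], m: int) -> Dict[str, str]:
    """Map each prefix (the empty string is the root phi) to the bits that follow it."""
    width = (m - 1).bit_length()          # ceil(log2 m) bits per symbol, m >= 2
    tree: Dict[str, List[str]] = defaultdict(list)
    for x in symbols:
        code = format(x, f"0{width}b").replace("1", "H").replace("0", "T")
        for depth in range(width):         # prefixes of length 0 .. width - 1
            tree[code[:depth]].append(code[depth])
    return {prefix: "".join(bits) for prefix, bits in tree.items()}


def von_neumann(x: str) -> str:
    """Stand-in for Psi: HT -> 1, TH -> 0, HH and TT discarded."""
    return "".join("1" if a == "H" else "0"
                   for a, b in zip(x[0::2], x[1::2]) if a != b)


def universal_scheme(symbols: List[int], m: int,
                     psi: Callable[[str], str] = von_neumann) -> str:
    """Concatenate psi over the tree in the order phi, T, H, TT, TH, HT, HH, ..."""
    tree = binarize(symbols, m)
    order = sorted(tree, key=lambda g: (len(g), g.replace("T", "0").replace("H", "1")))
    return "".join(psi(tree[g]) for g in order)
```

On the example, `binarize([0, 1, 2, 1, 1, 2, 2, 1, 0], 3)` labels the root with TTHTTHHTT, the left child with THHHHT, and the right child with TTT (a constant sequence, which von Neumann's procedure maps to nothing), and `universal_scheme` returns 10101: 101 from the root followed by 01 from the left child.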
B. Generalized Random-Stream Algorithm

We want to generalize the random-stream algorithm to generate random-bit streams from an $m$-sided die. Using an idea similar to the one above, we convert the input stream into multiple binary streams, where each binary stream corresponds to a node in the binarization tree. We apply the random-stream algorithm to each of these binary streams individually, and for each stream we create a status tree for storing state information. When we read a roll of the $m$-sided die from the source, we pass the $\lceil\log_2 m\rceil$ bits of its binary representation to $\lceil\log_2 m\rceil$ different streams that correspond to a path in the binarization tree. Then we process all these $\lceil\log_2 m\rceil$ streams from top to bottom along that path. In this way, a single binary stream is produced. While each node in the binarization tree generates a random-bit stream, the resulting single stream is a mixture of these random-bit streams. But it is not obvious whether the resulting stream is a random-bit stream or not, since the values of the generated bits affect their order.
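The routing step just described can be sketched as follows: each roll touches exactly $\lceil\log_2 m\rceil$ nodes of the binarization tree, one per level, from the root downward. The encoding (H for 1, T for 0, most significant bit first) matches the example that follows, and the callback `on_bit(prefix, bit)` is a hypothetical hook standing in for feeding one symbol to the status tree of the node with that prefix; the status-tree update itself is not shown.

```python
from typing import Callable, Iterable, List, Tuple


def route(symbol: int, m: int) -> List[Tuple[str, str]]:
    """(prefix, bit) pairs that one roll of an m-sided die feeds, top to bottom."""
    width = (m - 1).bit_length()           # ceil(log2 m) for m >= 2
    code = format(symbol, f"0{width}b").replace("1", "H").replace("0", "T")
    return [(code[:i], code[i]) for i in range(width)]


def run(rolls: Iterable[int], m: int, on_bit: Callable[[str, str], None]) -> None:
    """Process rolls one at a time, visiting the nodes on each roll's path in order."""
    for x in rolls:
        for prefix, bit in route(x, m):
            on_bit(prefix, bit)            # hypothetical hook: update that node's status tree


streams = {}
run([0, 1, 2, 1, 1, 2, 2, 1, 0], 3,
    lambda prefix, bit: streams.setdefault(prefix, []).append(bit))
print({prefix: "".join(bits) for prefix, bits in streams.items()})
```

Here `route(1, 3)` returns `[("", "T"), ("T", "H")]`, matching the 4th symbol in the example: T goes to the root and H goes to the root's left child. Recording the calls rebuilds the same per-node streams as the fixed-length binarization, but one symbol at a time.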

The following example demonstrates this algorithm.

Let us consider a stream of symbols generated from a 3-sided die,

012112210...

Instead of storing a binary sequence at each node in the binarization tree, we associate each node with a status tree corresponding to a random-stream algorithm. Here, we get two nontrivial binary streams

TTHTTHHTT..., THHHHT...

corresponding to the prefixes $\phi$ and T, respectively. Fig. 5 demonstrates how the status trees change when we read symbols one by one. For instance, when the 4th symbol 1 (TH) is read, it passes T to the root node (corresponding to the prefix $\phi$) and passes H to the left child of the root node (corresponding to the prefix T) of the binarization tree. Based on the rules of the random-stream algorithm, we modify the status trees associated with these two nodes. During this process, a bit is generated.

Finally, this scheme generates a stream of bits 010..., where the first bit is generated after reading the 4th symbol, the second bit after reading the 5th symbol, and so on. We call this scheme the generalized random-stream algorithm. As expected, this algorithm can generate a stream of random bits from an arbitrary loaded die with $m > 2$ sides.

Theorem 11. Given a loaded die with $m > 2$ sides, if we stop running the generalized random-stream algorithm after generating $k$ bits, then these $k$ bits are independent and unbiased.



The proof of the above theorem is given in Subsection III-C. Since the random-stream algorithm is as efficient as Peres's algorithm asymptotically, we can prove that the generalized random-stream algorithm is also asymptotically optimal.

Fig. 5. The changes of status trees in the generalized random-stream algorithm when the input stream is 012112210.... (Panels show the status trees for prefix $\phi$ and prefix T after reading 0 (TT), 1 (TH), 2 (HT), 1 (TH), 1 (TH), and 2 (HT).)

Theorem 12. Given an $m$-sided die with probability distribution $(p_0, p_1, \ldots, p_{m-1})$, let $n$ be the number of symbols (dice rolls) used in the generalized random-stream algorithm and let $k$ be the number of random bits generated; then

$$\lim_{k\to\infty}\frac{E[n]}{k} = \frac{1}{H(p_0, p_1, \ldots, p_{m-1})},$$

where

$$H(p_0, p_1, \ldots, p_{m-1}) = \sum_{i=0}^{m-1} p_i\log_2\frac{1}{p_i}$$

is the entropy of the $m$-sided die.

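Proof: First, according to Shannon's theory, it is easy to get that

$$\lim_{k\to\infty}\frac{E[n]}{k} \ge \frac{1}{H(p_0, p_1, \ldots, p_{m-1})}.$$

Now, we let

$$l = \frac{k(1+\epsilon)}{H(p_0, p_1, \ldots, p_{m-1})}$$

with an arbitrary $\epsilon > 0$. Following the proof of Theorem 7 in [10], it can be shown that when $k$ is large enough, the algorithm generates more than $k$ random bits from $l$ symbols with probability at least $1 - \delta$ for any $\delta > 0$. Then using the same argument as in Theorem 3, we can get

$$\lim_{k\to\infty}\frac{E[n]}{k} \le \frac{1}{H(p_0, p_1, \ldots, p_{m-1})}\cdot\frac{1+\epsilon}{1-\delta}$$

for any $\epsilon, \delta > 0$.

Hence, we can get the conclusion in the theorem. ■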
Of course, we can limit the depths of all the status trees to save space; the proof is omitted. It can be seen that given a loaded die of $m$ sides, the space usage is proportional to $m$ and the expected computational time per symbol is proportional to $\log m$.

C. Proof of Theorem 11

Here, we want to prove that the generalized random-stream algorithm generates a stream of random bits from an arbitrary $m$-sided die. Similar to the above, we let $S_Y$ with $Y \in \{0,1\}^*$ denote the set consisting of all the sequences yielding $Y$. Here, we say that a sequence $X$ yields $Y$ if and only if $X[1 : |X| - 1]$ generates a sequence shorter than $Y$ and $X$ generates a sequence with $Y$ as a prefix (including $Y$ itself). We would like to show that the elements in $S_{Y_1}$ and those in $S_{Y_2}$ are in one-to-one correspondence if $Y_1$ and $Y_2$ have the same length.

Definition 2. Two sequences $X_A, X_B \in \{0, 1, \ldots, m-1\}^*$ with $m > 2$ are equivalent, denoted by $X_A \equiv X_B$, if and only if $X_A^\gamma \equiv X_B^\gamma$ for all $\gamma \in \Upsilon_b$, where $X_A^\gamma$ is the binary sequence labeled on a node corresponding to a prefix $\gamma$ in the binarization tree induced by $X_A$, and the equivalence of $X_A^\gamma$ and $X_B^\gamma$ was given in Definition 1.

Lemma 13 ([10]). Let $\{X_A^\gamma\}$ with $\gamma \in \Upsilon_b$ be the binary sequences labeled on the binarization tree of $X_A \in \{0, 1, \ldots, m-1\}^n$ as defined above. Assume $X_B^\gamma$ is a permutation of $X_A^\gamma$ for all $\gamma \in \Upsilon_b$; then there exists exactly one sequence $X_B \in \{0, 1, \ldots, m-1\}^n$ such that it yields a binarization tree that labels $\{X_B^\gamma\}$ with $\gamma \in \Upsilon_b$.

Proof: The proof is provided in [10]. ■

Lemma 14. Let $f$ be the function of the generalized random-stream algorithm, and let $X_A$ be a sequence produced by an $m$-sided die. For any distinct sequences $Y_1, Y_2 \in \{0,1\}^k$, if $X_A \in S_{Y_1}$, there is exactly one sequence $X_B \in S_{Y_2}$ such that

• $X_B \equiv X_A$.

• $f(X_A) = Y_1\Delta$ and $f(X_B) = Y_2\Delta$ for some binary sequence $\Delta$.

Proof: The idea of the proof is to combine the proof of Lemma 7 with the result in Lemma 13.



Let us prove this conclusion by induction. Here, we use $X_A'$ to denote the prefix of $X_A$ of length $|X_A| - 1$ and use $\beta$ to denote the last symbol of $X_A$, so $X_A = X_A'\beta$. Let $X_A'^\gamma$ be the binary sequence labeled on a node corresponding to a prefix $\gamma$ in the binarization tree induced by $X_A'$, and denote the status tree of $X_A'^\gamma$ with $\gamma \in \Upsilon_b$ by $T_A^\gamma$.

When $k = 1$, if $X_A \in S_0$, we can write $f(X_A)$ as $0\Delta$. In this case, let $u$ in $T_A^\theta$ with $\theta \in \Upsilon_b$ be the node that generates the first bit 0. If we flip the label of $u$ from 0 to 1, we get another status tree $T_B^\theta$. Using the same argument as in Lemma 6, we are able to construct a sequence $X_B'^\theta$ such that its status tree is $T_B^\theta$ and it does not generate any bits. Here, $X_B'^\theta$ is a permutation of $X_A'^\theta$. From $X_B'^\theta$ together with $X_A'^\gamma$ for all $\gamma \ne \theta$, we can construct a sequence $X_B'$ uniquely following the proof of Lemma 13 (see [10]). Concatenating $X_B'$ with $\beta$ results in a new sequence $X_B = X_B'\beta$ such that $X_B \equiv X_A$ and $f(X_B) = 1\Delta$. Inversely, we can get the same result. This shows that the elements in $S_0$ and $S_1$ are in one-to-one correspondence.

Now we assume that the conclusion holds for all $Y_1, Y_2 \in \{0,1\}^k$; then we show that it also holds for any $Y_1, Y_2 \in \{0,1\}^{k+1}$. Two cases need to be considered.

1) $Y_1, Y_2$ end with the same bit. Without loss of generality, we assume that this bit is 0; then we can write $Y_1 = Y_1'0$ and $Y_2 = Y_2'0$. If $X_A$ yields $Y_1'$, based on our assumption, it is easy to see that there exists a sequence $X_B$ that satisfies our requirements. If $X_A$ does not yield $Y_1'$, then $Y_1'$ has been generated before reading the symbol $\beta$. Let us consider a prefix of $X_A$, denoted by $\bar{X}_A$, such that it yields $Y_1'$. In this case, $f(\bar{X}_A) = Y_1'$ and we can write $X_A = \bar{X}_A Z$. According to our assumption, there exists exactly one sequence $\bar{X}_B$ such that $\bar{X}_B \equiv \bar{X}_A$ and $f(\bar{X}_B) = Y_2'$. Since $\bar{X}_A$ and $\bar{X}_B$ lead to the same binarization tree (all the status trees at the same positions are the same), if we construct a sequence $X_B = \bar{X}_B Z$, then $X_B \equiv X_A$ and $X_B$ generates the same bits as $X_A$ when reading symbols from $Z$. It is easy to see that such a sequence $X_B$ satisfies our requirements.

Since this result is also true for the inverse case, if $Y_1, Y_2 \in \{0,1\}^{k+1}$ end with the same bit, the elements in $S_{Y_1}$ and $S_{Y_2}$ are in one-to-one correspondence.

2) Let us consider the case that $Y_1, Y_2$ end with different bits. Without loss of generality, we assume that $Y_1 = Y_1'0$ and $Y_2 = Y_2'1$. According to the argument above, the elements in $S_{00\cdots00}$ and $S_{Y_1}$ are in one-to-one correspondence, and the elements in $S_{00\cdots01}$ and $S_{Y_2}$ are in one-to-one correspondence. So our task is to prove that the elements in $S_{00\cdots00}$ and $S_{00\cdots01}$ are in one-to-one correspondence. For any sequence $X_A \in S_{00\cdots00}$, let $X_A'$ be its prefix of length $|X_A| - 1$. Here, $X_A'$ generates only zeros, at most $k$ of them. Let $T_A^\theta$ denote the status tree such that $u \in T_A^\theta$ is the node that generates the $(k+1)$th bit (a zero) when reading the symbol $\beta$. Then we can construct a new sequence $X_B'$ such that

• Let $\{X_B'^\gamma\}$ with $\gamma \in \Upsilon_b$ be the binary sequences induced by $X_B'$, and let $T_B^\gamma$ be the status tree of $X_B'^\gamma$. The binarization trees of $X_A'$ and $X_B'$ are the same (all the status trees at the same positions are the same), except that the label of $u$ is 0 and the label of its corresponding node $v$ in $T_B^\theta$ is 1.

• Each node $u$ in $T_A^\gamma$ generates the same bits as its corresponding node $v$ in $T_B^\gamma$ for all $\gamma \in \Upsilon_b$.

The construction of $X_B'$ follows the proof of Lemma 7 and then Lemma 13. If we construct a sequence $X_B = X_B'\beta$, it is not hard to show that $X_B$ satisfies our requirements, i.e.,

• $X_B \equiv X_A$;

• $X_B'$ generates fewer than $k + 1$ bits, i.e., $|f(X_B')| \le k$;

• if $f(X_A) = 0^k0\Delta$, then $f(X_B) = 0^k1\Delta$.

Also based on the inverse argument, we see that the elements in $S_{00\cdots00}$ and $S_{00\cdots01}$ are in one-to-one correspondence.

Finally, we can conclude that the elements in $S_{Y_1}$ and $S_{Y_2}$ are in one-to-one correspondence for any $Y_1, Y_2 \in \{0,1\}^k$ with $k > 0$.

This completes the proof. ■

Based on the above result and the argument for Theorem 2, we can easily prove Theorem 11.

Theorem 11. Given a loaded die with $m > 2$ sides, if we stop running the generalized random-stream algorithm after generating $k$ bits, then these $k$ bits are independent and unbiased.

IV. Extension for Markov Chains 

In this section, we study how to efficiently generate random-bit streams from Markov chains. The nonstream case was studied by Samuelson and by Blum [1], and later generalized by Zhou and Bruck [9]. Here, using the techniques developed in [9] and applying the techniques introduced in this paper, we are able to generate random-bit streams from Markov chains. We present the algorithm briefly.

For a given Markov chain, it generates a stream of states, denoted by $x_1x_2x_3\cdots \in \{s_1, s_2, \ldots, s_m\}^*$. We can treat each state, say $s$, as a die and consider the 'next states' (the states the chain has transitioned to after being at state $s$) as the results of die rolls, called the exits of $s$. For all $s \in \{s_1, s_2, \ldots, s_m\}$, if we only consider the exits of $s$, they form a stream. So we have in total $m$ streams corresponding to the exits of $s_1, s_2, \ldots, s_m$, respectively. For example, assume the input is

$$X = s_1s_4s_2s_1s_3s_2s_3s_1s_1s_2s_3s_4s_1\ldots$$

If we consider the states following $s_1$, we get a stream as the set of states in boldface:

$$X = s_1\mathbf{s_4}s_2s_1\mathbf{s_3}s_2s_3s_1\mathbf{s_1}\mathbf{s_2}s_3s_4s_1\ldots$$

Hence the four streams are

$$s_4s_3s_1s_2\ldots,\quad s_1s_3s_3\ldots,\quad s_2s_1s_4\ldots,\quad s_2s_1\ldots$$
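Splitting a trajectory into exit streams takes a single pass. In this sketch (the name `exit_streams` is ours), states are plain strings, and the final state contributes no exit because its successor has not been observed yet.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def exit_streams(states: Sequence[str]) -> Dict[str, List[str]]:
    """Map each state to the ordered list of states that followed its visits."""
    exits: Dict[str, List[str]] = defaultdict(list)
    for cur, nxt in zip(states, states[1:]):   # consecutive (x_j, x_{j+1}) pairs
        exits[cur].append(nxt)
    return dict(exits)
```

On the example trajectory, written as the list `"s1 s4 s2 s1 s3 s2 s3 s1 s1 s2 s3 s4 s1".split()`, this returns s4 s3 s1 s2 for s1, s1 s3 s3 for s2, s2 s1 s4 for s3, and s2 s1 for s4, matching the four streams above.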

The generalized random-stream algorithm is applied to each stream separately for generating random-bit streams. Here, when we get an exit of a state $s$, we should not directly pass it to the generalized random-stream algorithm that corresponds to the state $s$. Instead, it waits until we get the next exit of the state $s$. In other words, we keep the current exit pending. In the above example, after we read $s_1s_4s_2s_1s_3s_2s_3s_1s_1s_2s_3s_4s_1$, $s_4s_3s_1$ has been passed to the generalized random-stream algorithm corresponding to $s_1$, $s_1s_3$ has been passed to the generalized random-stream algorithm corresponding to $s_2$, and so on, while the most recent exit of each state, namely $s_2, s_3, s_4, s_1$ for $s_1, s_2, s_3, s_4$, respectively, is pending. Finally, we mix all the bits generated from different streams based on their natural generating order. As a result, we get a stream of random bits from an arbitrary Markov chain, and it achieves the information-theoretic upper bound on efficiency.

We call this algorithm the random-stream algorithm for Markov chains, and we describe it as follows.

Input: A stream $X = x_1x_2x_3\ldots$ produced by a Markov chain, where $x_i \in S = \{s_1, s_2, \ldots, s_m\}$.
Output: A stream of 0's and 1's.
Main Function:

  Let $\Phi_i$ be the generalized random-stream algorithm for the exits of $s_i$, and $\theta_i$ be the pending exit of $s_i$, for $1 \le i \le m$.
  Set $\theta_i = \phi$ for $1 \le i \le m$.
  for each symbol $x_j$ read from the Markov chain do
    if $x_{j-1} = s_i$ then
      if $\theta_i \ne \phi$ then
        Input $\theta_i$ to $\Phi_i$ for processing.
      end if
      Set $\theta_i = x_j$.
    end if
  end for
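A direct Python transcription of the main loop may help in reading it. Here `feed(s, e)` is a hypothetical callback standing in for passing exit `e` to $\Phi_s$, the generalized random-stream algorithm of state `s`, and an absent dictionary entry plays the role of $\theta_i = \phi$; mixing the output bits in generation order is left to whatever `feed` does. The name `markov_dispatch` is ours.

```python
from typing import Callable, Dict, Hashable, Iterable


def markov_dispatch(states: Iterable[Hashable],
                    feed: Callable[[Hashable, Hashable], None]) -> Dict[Hashable, Hashable]:
    """Pass each exit to its state's processor one step late; return what is still pending."""
    pending: Dict[Hashable, Hashable] = {}  # theta_i; a missing key means theta_i = phi
    start = object()                         # sentinel: no previous symbol yet
    prev = start
    for x in states:                         # x is x_j, prev is x_{j-1}
        if prev is not start:
            if prev in pending:
                feed(prev, pending[prev])    # release the previous exit of state prev
            pending[prev] = x                # x becomes the new pending exit of prev
        prev = x
    return pending
```

Running it on the example with a `feed` that records its arguments reproduces the description in the text: s4 s3 s1 reaches the processor of $s_1$, s1 s3 reaches that of $s_2$, and the exits still pending are s2, s3, s4, s1 for $s_1, s_2, s_3, s_4$.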

Theorem 15. Given a source of a Markov chain with unknown transition probabilities, the random-stream algorithm for Markov chains generates a stream of random bits, i.e., for any $k > 0$, if we stop running the algorithm after generating $k$ bits, then these $k$ bits are independent and unbiased.

The proof of the above theorem is a simple extension of the proof for Theorem 11. Let $S_Y$ denote the set of input sequences that yield a binary sequence $Y$. Our main idea is still to prove that the elements in $S_{Y_1}$ and $S_{Y_2}$ are in one-to-one correspondence for all $Y_1, Y_2 \in \{0,1\}^k$ with $k > 0$. The detailed proof is a little complex, but it is not difficult; we only need to follow the proof of Theorem 11 and combine it with the following result from [9]. Here, we omit the detailed proof.

Lemma 16. Given an input sequence $X = x_1x_2\ldots x_N \in \{s_1, s_2, \ldots, s_m\}^N$ produced by a Markov chain, let $\pi_i(X)$ be the exit sequence of $s_i$ (the symbols following $s_i$) for $1 \le i \le m$. Assume that $[\Lambda_1, \Lambda_2, \ldots, \Lambda_m]$ is an arbitrary collection of exit sequences such that $\Lambda_i$ is a permutation of $\pi_i(X)$ and the two have the same last element for all $1 \le i \le m$. Then there exists a sequence $X' = x'_1x'_2\ldots x'_N \in \{s_1, s_2, \ldots, s_m\}^N$ such that $x'_1 = x_1$ and $\pi_i(X') = \Lambda_i$ for all $1 \le i \le m$. For this $X'$, we have $x'_N = x_N$.
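The sequence $X'$ in Lemma 16 can be found constructively: starting from $x_1$, repeatedly move along the next unused exit of the current state. The sketch below (the name `rebuild_from_exits` is ours) does exactly that and returns None if the walk stops before all exits are used; under the hypotheses of the lemma it should consume every exit and end at $x_N$. This only illustrates the statement; it is not the proof from [9].

```python
from typing import Dict, Hashable, List, Optional, Sequence


def rebuild_from_exits(start: Hashable,
                       exits: Dict[Hashable, Sequence[Hashable]]) -> Optional[List[Hashable]]:
    """Return the trajectory from `start` whose exit sequences are `exits`, or None."""
    used = {s: 0 for s in exits}             # how many exits of each state are consumed
    path = [start]
    cur = start
    while used.get(cur, 0) < len(exits.get(cur, ())):
        nxt = exits[cur][used[cur]]          # next unused exit of the current state
        used[cur] += 1
        path.append(nxt)
        cur = nxt
    if any(used[s] < len(exits[s]) for s in exits):
        return None                          # walk got stuck: no such trajectory
    return path
```

For the example trajectory, replacing the exit sequence of $s_1$ (s4 s3 s1 s2) by its permutation s3 s4 s1 s2, which keeps the last element s2, yields s1 s3 s2 s1 s4 s2 s3 s1 s1 s2 s3 s4 s1: the same length, the same first and last states, and the requested exit sequences. Because the walk is deterministic, any trajectory with these exits and this starting state must coincide with it.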

V. Conclusion 

In this paper, we addressed the problem of generating random-bit streams from i.i.d. sources with unknown distributions. First, we considered the case of biased coins and derived a simple algorithm to generate random-bit streams. This algorithm achieves the information-theoretic upper bound on efficiency. We showed that this algorithm can be generalized to generate random-bit streams from an arbitrary $m$-sided die with $m > 2$, and its information efficiency is also asymptotically optimal. Furthermore, we demonstrated that by applying the (generalized) random-stream algorithm, we can generate random-bit streams from an arbitrary Markov chain very efficiently.